Forecasting with Exponential Smoothing Models

Oran Graduate School of Economics

Author

Dr. BENGHALEM Abdelhadi

1 Introduction to Exponential Smoothing

Developed in the late 1950s by Brown, Holt, and Winters, exponential smoothing methods remain among the most practical forecasting tools.

These methods create forecasts as exponentially weighted averages of past observations, giving more weight to recent data.

Note

Exponential smoothing methods generate forecasts by calculating a weighted average of past observations, where the weights assigned to older data decrease exponentially.

This approach emphasizes recent data more heavily than older data, reflecting the assumption that recent observations are more relevant for predicting future values

1.1 Exponential Smoothing Models

  • Simple Exponential Smoothing (SES): Suitable for series without trends or seasonal patterns, SES assumes a constant local mean over time. It uses all past observations, with exponentially decreasing weights for older observations controlled by the smoothing parameter \(\alpha\).

  • Holt’s Linear Trend Model: Extends SES to accommodate data with trends by adding a trend component, which is updated over time.

  • Holt-Winters Seasonal Model: Further extends Holt’s model to include a seasonal component, making it ideal for data with both trends and seasonality.

| Choose Model Based On | SES | Holt's | Holt-Winters |
| --- | --- | --- | --- |
| Trend? | ❌ No | ✔️ Yes | ✔️ Yes |
| Seasonality? | ❌ No | ❌ No | ✔️ Yes |
| Complexity | Low | Moderate | High |

2 Simple Exponential Smoothing (SES)

Best for: Data without clear trends or seasonality (e.g., Algeria exports data below).

🫵 Using the algeria_economy dataset, create a plot to visualize Algeria’s exports as a percentage of GDP over time. Interpret the trend—what patterns or fluctuations do you observe, and what economic factors might explain them?

Code
algeria_economy <- global_economy |> 
  filter(Country == "Algeria")

algeria_economy |> 
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Exports: Algeria")

✔️ High Volatility: Algeria’s exports (% of GDP) show significant fluctuations 📉📈, reflecting economic instability.

🛢️ Oil Dependency: Peaks align with oil booms 💰, while drops coincide with global oil crises and policy shifts ⚠️.

Simple Exponential Smoothing (SES): The SES model forecasts future values using a weighted average where weights decrease exponentially as observations get older. The weights are determined by the smoothing parameter \(\alpha\), which lies between 0 and 1. The forecast equation is:

\(\hat{Y}_{T+1} = \alpha Y_T + (1 - \alpha) \hat{Y}_T\)

📌 Where:
\(\hat{Y}_{T+1}\) → Forecast for next period 📈
\(Y_T\) → Actual observation at time \(T\) 📊
\(\hat{Y}_T\) → Previous forecast value ⏳
\(\alpha\) → Smoothing parameter \((0 \leq \alpha \leq 1)\) ⚙️

🔑 Key Insights:

✔️ Simplest model in the exponential smoothing family 🎯
✔️ Assumes a constant level (no trend or seasonality) 📉
✔️ Uses exponentially weighted past observations 🔄
✔️ Smoothing parameter \(\alpha\) controls how fast old data is discounted 🎚️

🔮 Forecast for \(T+1\): A weighted average of all past observations, with exponentially decaying weights!
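The update rule above can be checked directly in base R. This is a minimal sketch, not part of any package; the numeric values are illustrative only:

```r
# One-step SES update: y_hat(T+1) = alpha * y(T) + (1 - alpha) * y_hat(T)
ses_update <- function(y_t, yhat_t, alpha = 0.3) {
  alpha * y_t + (1 - alpha) * yhat_t
}

# Example: last observation 45, previous forecast 38.83, alpha = 0.3
ses_update(45, 38.83)  # 40.681
```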

Let’s recall and compare :

Naïve Method

Forecasts assume the last observed value is the best predictor for all future periods: \[ \hat{y}_{T+h|T} = y_T \quad \text{for } h = 1, 2, \dots \]

📌 Key Properties:
Weight Allocation: 100% weight to most recent observation \((y_T)\)
Use Case: Stable series with no trend/seasonality
⚠️ Limitation: Ignores all historical data except last value


Average Method

Forecasts use the mean of all historical observations: \[ \hat{y}_{T+h|T} = \frac{1}{T} \sum_{t=1}^{T} y_t \quad \text{for } h = 1, 2, \dots \]

📌 Key Properties:
Weight Allocation: Equal weights \((1/T)\) to all observations
Use Case: Perfectly stationary data with mean-reverting behavior
⚠️ Limitation: Fails to adapt to trends or changing patterns


Simple Exponential Smoothing

Balances naïve and average methods using exponentially decaying weights

Forecasts are weighted averages where weights decay exponentially: \[ \hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha)y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots \] Where:
- \(0 \leq \alpha \leq 1\) (smoothing parameter): the rate at which the weights decrease is controlled by \(\alpha\)
- Weights decay geometrically: \(\alpha(1-\alpha)^j\) for \(j=0,1,2,\dots\)

Weight Distribution Examples:

| \(\alpha\) | Weight on \(y_T\) | Weight on \(y_{T-1}\) | Weight on \(y_{T-2}\) | Sum ≈ 1? |
| --- | --- | --- | --- | --- |
| 0.2 | 0.2 | 0.16 | 0.128 | Yes |
| 0.5 | 0.5 | 0.25 | 0.125 | Yes |
| 0.8 | 0.8 | 0.16 | 0.032 | Yes |
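The weight columns above can be reproduced with one line of base R (an illustrative helper, not from any package):

```r
# Geometric SES weights alpha * (1 - alpha)^j for j = 0, 1, ..., j_max
ses_weights <- function(alpha, j_max) alpha * (1 - alpha)^(0:j_max)

ses_weights(0.2, 2)  # 0.200 0.160 0.128
ses_weights(0.5, 2)  # 0.500 0.250 0.125
ses_weights(0.8, 2)  # 0.800 0.160 0.032

# The weights sum to (almost) 1 as j_max grows
sum(ses_weights(0.5, 50))
```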

📌 Key Advantages:
✅ Adapts to recent trends better than naïve/average methods
✅ Flexible weighting via \(\alpha\):
- \(\alpha \to 1\) ≈ Naïve method (high reactivity)
- \(\alpha \to 0\) ≈ Average method (strong smoothing)
✅ Computationally efficient


2.1 Practical Exercise

A company has accumulated demand data for its computers over the past 12 months, shown in the following table.

  • a. Compute an exponentially smoothed forecast, using an \(\alpha\) value of 0.3.

  • b. Compute an exponentially smoothed forecast, using an \(\alpha\) value of 0.5.

| Month | Orders Delivered per Month |
| --- | --- |
| January | 37 |
| February | 40 |
| March | 41 |
| April | 37 |
| May | 45 |
| June | 50 |
| July | 43 |
| August | 47 |
| September | 56 |
| October | 52 |
| November | 55 |
| December | 54 |

Solution

Initial value: 37 (the first observation in the data is used as the initial forecast).

  • a. Compute an exponentially smoothed forecast, using an \(\alpha\) value of 0.3

| Month | \(D_t\) | Forecast \(F_t\) (\(\alpha = 0.30\)) |
| --- | --- | --- |
| January | 37 | – |
| February | 40 | \(0.30(37) + 0.70(37) = 37\) |
| March | 41 | \(0.30(40) + 0.70(37) = 37.90\) |
| April | 37 | \(0.30(41) + 0.70(37.90) = 38.83\) |
| May | 45 | \(0.30(37) + 0.70(38.83) = 38.28\) |
| June | 50 | \(0.30(45) + 0.70(38.28) = 40.30\) |
| July | 43 | \(0.30(50) + 0.70(40.30) = 43.21\) |
| August | 47 | \(0.30(43) + 0.70(43.21) = 43.15\) |
| September | 56 | \(0.30(47) + 0.70(43.15) = 44.30\) |
| October | 52 | \(0.30(56) + 0.70(44.30) = 47.81\) |
| November | 55 | \(0.30(52) + 0.70(47.81) = 49.07\) |
| December | 54 | \(0.30(55) + 0.70(49.07) = 50.85\) |
| January | – | \(0.30(54) + 0.70(50.85) = 51.79\) |

b. Compute an exponentially smoothed forecast, using an \(\alpha\) value of 0.5

| Month | \(D_t\) | Forecast \(F_t\) (\(\alpha = 0.50\)) |
| --- | --- | --- |
| January | 37 | – |
| February | 40 | \(0.50(37) + 0.50(37) = 37\) |
| March | 41 | \(0.50(40) + 0.50(37) = 38.5\) |
| April | 37 | \(0.50(41) + 0.50(38.5) = 39.75\) |
| May | 45 | \(0.50(37) + 0.50(39.75) = 38.375\) |
| June | 50 | \(0.50(45) + 0.50(38.375) = 41.6875\) |
| July | 43 | \(0.50(50) + 0.50(41.6875) = 45.84375\) |
| August | 47 | \(0.50(43) + 0.50(45.84375) = 44.421875\) |
| September | 56 | \(0.50(47) + 0.50(44.421875) = 45.7109375\) |
| October | 52 | \(0.50(56) + 0.50(45.7109375) = 50.8554688\) |
| November | 55 | \(0.50(52) + 0.50(50.8554688) = 51.4277344\) |
| December | 54 | \(0.50(55) + 0.50(51.4277344) = 53.2138672\) |
| January | – | \(0.50(54) + 0.50(53.2138672) = 53.6069336\) |
Code
# Load the forecast package
library(forecast)
library(TSstudio)

# Original time series data
original_series <- c(37, 40, 41, 37, 45, 50, 43, 47, 56, 52, 55, 54)

# Create time series object
ts_data <- ts(original_series)

# Exponential smoothing forecasts with alpha = 0.3
forecast_03 <- forecast(ses(ts_data, alpha = 0.3), h = 1)

# Exponential smoothing forecasts with alpha = 0.5
forecast_05 <- forecast(ses(ts_data, alpha = 0.5), h = 1)
Code
forecast_03
   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
13        51.8469 44.30635 59.38745 40.31462 63.37917
Code
forecast_05
   Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
13       53.60739 46.99172 60.22305 43.4896 63.72517
Code
accuracy(forecast_03)
                   ME     RMSE      MAE     MPE     MAPE      MASE      ACF1
Training set 3.047576 5.371262 4.330679 5.50894 8.935072 0.9721932 0.1117925
Code
accuracy(forecast_05)
                   ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
Training set 2.458106 4.712451 3.743754 4.608617 7.902775 0.8404346 -0.1994535
Code
ex <- ses(ts_data, alpha = 0.3, h = 5)
Code
ex1 <- ses(ts_data, alpha =0.5, h = 5)
Code
library(ggplot2)
# Plot the original series and fitted values using autoplot
autoplot(ts_data) +
  autolayer(ex$fitted, series = "Alpha = 0.3") +
  autolayer(ex1$fitted, series = "Alpha = 0.5") +
  ylab("Value") +
  xlab("Month") +
  labs(title = "Original Series vs. Fitted Values")

In the context of forecasting, exponential smoothing is a technique used to predict future values based on past observations. The smoothing constant α (alpha) plays a critical role in this method. It determines how much weight is given to recent data points versus older ones.

  • When α is close to 1, the forecast is highly influenced by recent observations, making it sensitive to short-term fluctuations.

  • When α is close to 0, the forecast relies more on historical data, resulting in a smoother prediction.

Adjusting \(\alpha\) based on demand trends is essential for achieving accurate forecasts. If demand is stable, a smaller \(\alpha\) is preferable. Conversely, if demand exhibits significant fluctuations, a larger \(\alpha\) is recommended.

Remember that choosing the right α value requires a balance between responsiveness to recent changes and stability in the forecast. 📈

Code
autoplot(ex) +
  autolayer(fitted(ex), series="Fitted") +
  ylab("Orders") + xlab("Month")

Code
autoplot(ex1) +
  autolayer(fitted(ex1), series="Fitted") +
  ylab("Orders") + xlab("Month")

Alternative: using `plot_forecast` from the TSstudio package.
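A minimal sketch of this alternative, assuming the `ex` and `ex1` objects created in the chunks above; `plot_forecast()` from TSstudio renders an interactive version of the same forecast plot:

```r
library(TSstudio)

# Interactive forecast plots for the two SES fits from above
plot_forecast(ex)   # alpha = 0.3
plot_forecast(ex1)  # alpha = 0.5
```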
Weighted Average Form

Forecasts combine the latest observation \((y_t)\) and prior forecast \((\hat{y}_{t|t-1})\) with smoothing parameter \(\alpha\): \[ \hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1} \]

The process has to start somewhere, so we let the first fitted value at time 1 be denoted by \(\ell_0\) (which we will have to estimate).

Recursive Expansion: \[ \begin{align*} \hat{y}_{2|1} &= \alpha y_1 + (1-\alpha)\ell_0 \\ \hat{y}_{3|2} &= \alpha y_2 + \alpha(1-\alpha)y_1 + (1-\alpha)^2\ell_0 \\ &\vdots \\ \hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T\ell_0 \end{align*} \]

📌 Key Properties:
✅ Exponentially decaying weights: \(\alpha(1-\alpha)^j\)
✅ Initial level \((\ell_0)\) becomes negligible as \(T\) grows


Component Form

We have

\[ \hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1} \]

Decomposes forecasts into level component \((\ell_t)\):

Forecast Equation:
\[ \hat{y}_{t+h|t} = \ell_t \quad (h \geq 1) \]

Smoothing Equation:
\[ \ell_t = \alpha y_t + (1-\alpha)\ell_{t-1} \]

📌 Interpretation:
\(\ell_t\) = Estimated series level at time \(t\)
✅ Identical to weighted average form when \(\ell_t = \hat{y}_{t+1|t}\)
Foundation for advanced models (trend/seasonal components)


Flat Forecasts

All future predictions equal the final level estimate: \[ \hat{y}_{T+h|T} = \ell_T \quad \text{for } h = 2, 3, \dots \]

📌 When to Use:
✅ No trend/seasonality in data
✅ Short-term forecasts where trend is negligible
⚠️ Limitation: Fails for trending/seasonal series


Parameter Optimization

Parameters \((\alpha, \ell_0)\) are estimated by minimizing SSE: \[ \text{SSE} = \sum_{t=1}^T (y_t - \hat{y}_{t|t-1})^2 \]

| Aspect | Regression | Exponential Smoothing |
| --- | --- | --- |
| Parameter Linearity | Linear in coefficients | Non-linear in \((\alpha, \ell_0)\): a non-linear minimisation problem |
| Solution Method | Closed-form (matrix inversion) | Numerical optimization (e.g., L-BFGS) |
| Initial Values | Not applicable | Critical for early forecasts |

📌 Implementation Steps:
1. Define SSE function using recursive equations
2. Use optimization algorithms to find \(\alpha \in [0,1]\) and \(\ell_0\)
3. Validate with metrics like AIC or cross-validation
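The steps above can be sketched in base R with `optim()`. This is an illustrative sketch, not the internals of any package; the demand series reuses the exercise data from earlier:

```r
# SSE for SES as a function of (alpha, l0), built from the recursive equations
ses_sse <- function(par, y) {
  alpha <- par[1]; level <- par[2]
  sse <- 0
  for (t in seq_along(y)) {
    sse <- sse + (y[t] - level)^2                # one-step-ahead error
    level <- alpha * y[t] + (1 - alpha) * level  # smoothing equation
  }
  sse
}

y <- c(37, 40, 41, 37, 45, 50, 43, 47, 56, 52, 55, 54)

# Numerical optimisation with alpha constrained to [0, 1]
fit <- optim(par = c(0.5, y[1]), fn = ses_sse, y = y,
             method = "L-BFGS-B", lower = c(0, min(y)), upper = c(1, max(y)))
fit$par  # estimated alpha and l0
```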

Why Not Just Use the Last Observation?

A flat forecast is not the same as repeating the last observation! SES smooths out noise by weighting all past observations (not just the most recent one). For example:

  • If the last observation is an outlier (e.g., a sudden spike), SES will dampen its impact using the smoothing parameter \(\alpha \in [0,1]\)

  • A flat forecast represents a smoothed average, not a naive “copy-paste” of the latest value.

2.2 Example: Algerian exports

In this example, simple exponential smoothing is applied to forecast exports of goods and services from Algeria.

Code
# Estimate parameters
fit <- algeria_economy |>
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc <- fit |>
  forecast(h = 5)
Code
fc |>
  autoplot(algeria_economy) +
  geom_line(aes(y = .fitted), col="#D55E00",
            data = augment(fit)) +
  labs(y="% of GDP", title="Exports: Algeria") +
  guides(colour = "none")

✔️ Forecast Plot (2018–2022)

✔️ Fitted Values (1960–2017): One-step-ahead estimates plotted alongside actual data 📈.

✔️ Uncertainty in Forecasts: Future exports have high variability ⚠️.

✔️ Key Warning: Ignoring uncertainty in point forecasts interpretation can be misleading ❗.

3 Holt’s Linear Trend and Damped Trend Methods

3.1 Holt’s Linear Trend Method

Extends simple exponential smoothing to capture trends via level (\(\ell_t\)) and slope (\(b_t\)) components:

Forecast Equation:
\[ \hat{y}_{t+h|t} = \ell_t + h b_t \quad (h = 1, 2, \dots) \]

Level Equation:
\[ \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \]

Trend Equation:
\[ b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1} \]

📌 Key Parameters:
- \(\alpha\): Smooths the level (reactivity to new observations)
- \(\beta^*\): Smooths the trend (reactivity to trend changes)
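The three equations can be turned into a short base R recursion. This is an illustrative sketch only; in practice `ETS()` estimates the parameters and initial states for you:

```r
# Holt's linear trend: run the level/trend recursions, then forecast h ahead
holt_forecast <- function(y, alpha, beta, l0, b0, h = 1) {
  level <- l0; trend <- b0
  for (t in seq_along(y)) {
    prev_level <- level
    level <- alpha * y[t] + (1 - alpha) * (level + trend)      # level equation
    trend <- beta * (level - prev_level) + (1 - beta) * trend  # trend equation
  }
  level + h * trend  # forecast equation: l_T + h * b_T
}

# Sanity check: with alpha = 1, beta = 0 the forecast is y_T + h * b0
holt_forecast(c(10, 12), alpha = 1, beta = 0, l0 = 10, b0 = 2, h = 1)  # 14
```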

3.1.1 Example: Australian population

🫵🧪 Exercise: Plotting Population Trends

Using the global_economy dataset, generate a time series plot of Australia’s population (in millions) over time. (Convert the population to millions by dividing by 1,000,000.)

Code
aus_economy <- global_economy |>
  filter(Code == "AUS") |>
  mutate(Pop = Population / 1e6)

autoplot(aus_economy, Pop) +
  labs(y = "Millions", title = "Australian population")

We will apply Holt’s method to this series. ⬇️⬇️

Code
fit <- aus_economy |>
  model(
    AAN = ETS(Pop ~ error("A") + trend("A") + season("N"))
  )
fc <- fit |> forecast(h = 10)
Code
report(fit)
Series: Pop 
Model: ETS(A,A,N) 
  Smoothing parameters:
    alpha = 0.9999 
    beta  = 0.3266366 

  Initial states:
     l[0]      b[0]
 10.05414 0.2224818

  sigma^2:  0.0041

      AIC      AICc       BIC 
-76.98569 -75.83184 -66.68347 

Smoothing parameters:

  • α (alpha) = 0.9999:
    This is the level smoothing parameter. A value near 1 means the model gives strong weight to the most recent observations when updating the level.

  • β (beta) = 0.3266:
    This is the trend smoothing parameter. A moderate value like this indicates that the trend component is updated, but also retains some memory of the past.

  • Initial states: ℓ₀ = 10.05414: Initial level of the series.

  • b₀ = 0.22248: Initial growth (trend) per period.

  • σ² = 0.0041: Variance of residuals (used for prediction intervals).

Model selection criteria: AIC, AICc, BIC — useful for comparing models (lower is better).

4 Damped Trend Method

4.1 Why Damping?

Holt’s linear trend assumes trends continue indefinitely, often leading to over-forecasting for long horizons. The damped trend method introduces a damping parameter \(\phi\) to gradually reduce the trend’s impact.


4.2 Key Equations

Forecast Equation:
\[ \hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t \]

Level Equation:
\[ \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1}) \]

Trend Equation:
\[ b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1} \]

📌 Parameters:
- \(\alpha\): Smooths the level
- \(\beta^*\): Smooths the trend
- \(\phi \in (0.8, 0.98)\): Dampens trend over time
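A quick numerical check of the damped forecast equation, with made-up values for \(\phi\), \(\ell_T\) and \(b_T\): as \(h\) grows, the forecast converges to \(\ell_T + \phi b_T / (1-\phi)\).

```r
# Damped trend forecast: l_T + (phi + phi^2 + ... + phi^h) * b_T
phi <- 0.9; l_T <- 100; b_T <- 2
damped_fc <- function(h) l_T + sum(phi^(1:h)) * b_T

damped_fc(1)                 # 101.8
damped_fc(100)               # already very close to the limit
l_T + phi * b_T / (1 - phi)  # limit: 118
```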


4.3 How Damping Works

  • Short-term forecasts (\(h\) small): Trend dominates (similar to Holt’s method).
  • Long-term forecasts (\(h \to \infty\)): Trend decays to a flat line converging to:
    \[ \ell_T + \frac{\phi b_T}{1 - \phi} \]
  • Effect of \(\phi\):
    • \(\phi \to 1\): Behaves like Holt’s linear trend (minimal damping).
    • \(\phi \to 0.8\): \(0.8\) is usually the minimum, as the damping has a very strong effect for smaller values.

Example: Australian Population Forecast

4.3.1 📊 Forecast Comparison with and without Damping

We apply two ETS models to forecast Australian population:

  • Holt’s Linear Trend: ETS(A,A,N)

  • Damped Trend: ETS(A,Ad,N) with damping parameter ϕ = 0.9

| Year | Holt Forecast (Millions) | Damped Forecast (Millions) |
| --- | --- | --- |
| 2018 | 24.97 | 24.97 |
| 2027 | 28.29 | 27.92 |

4.3.2 📌 Interpretation

  • Both models start at the same point in 2018, but diverge as the forecast horizon increases.

  • ✅ The damped trend model reduces the influence of the trend over time, resulting in slower forecast growth.

  • ✅ This is useful when the historical trend is unlikely to persist indefinitely — a common situation in population forecasts, economic indicators, etc.

Question

Why might damping be more realistic in population modeling?


4.4 Practical Considerations

  1. Parameter Estimation:
    • \(\phi\) is optimized alongside \(\alpha\) and \(\beta^*\) to minimize SSE.
  2. Why \(\phi \in [0.8, 0.98]\)?
    • Lower bound: Avoid excessive damping (trend vanishes too quickly).
    • Upper bound: Ensure damping effect is noticeable vs. Holt's method.

4.5 When to Use?

  • Long-term forecasts: Avoid over-forecasting trends (e.g., population, product demand).
  • Unstable trends: When historical trends may not persist indefinitely.

Now we will generate forecasts for the years 2018–2032 from Holt's linear trend method and the damped trend method.

Code
aus_economy |>
  model(
    `Holt's method` = ETS(Pop ~ error("A") +
                       trend("A") + season("N")),
    `Damped Holt's method` = ETS(Pop ~ error("A") +
                       trend("Ad", phi = 0.9) + season("N"))
  ) |>
  forecast(h = 15) |>
  autoplot(aus_economy, level = NULL) +
  labs(title = "Australian population",
       y = "Millions")+
  guides(colour = guide_legend(title = "Forecast"))

We have set the damping parameter to a relatively low number (\(\phi = 0.90\)) to exaggerate the effect of damping for comparison. Usually, we would estimate \(\phi\) along with the other parameters. We have also used a rather large forecast horizon (h = 15) to highlight the difference between a damped trend and a linear trend.

4.5.1 Example: Internet usage

4.5.2 📊 Task: Comparing Exponential Smoothing Methods for Internet Usage Forecasting

You are provided with a time series dataset that records the number of users connected to the internet via a server, observed every minute for 100 minutes.

Your goal is to compare the forecasting performance of the three exponential smoothing methods introduced so far:

  1. Simple Exponential Smoothing (SES) — assumes no trend or seasonality

  2. Holt’s Linear Method — incorporates a linear trend

  3. Damped Trend Method — includes a trend that gradually levels off over time

Code
www_usage <- as_tsibble(WWWusage)
www_usage |> autoplot(value) +
  labs(x="Minute", y="Number of users",
       title = "Internet usage per minute")

4.5.3 📈 Observations:

  1. Overall upward trend:
    • The number of users starts around 85 and ends above 200.
    • This suggests a positive long-term trend over the 100-minute window.

  2. Cyclic or wavy pattern:
    • There are multiple rises and falls, especially around minutes:
      • 15–20 (first peak),
      • 40–50 (second peak),
      • sharp drop around minute 60,
      • strong recovery from 70 to 100.
    • These cycles hint at short-term fluctuations, possibly due to user behavior (e.g., sessions starting/stopping).

  3. No clear seasonality:
    • While there are repeated ups and downs, they do not follow a fixed interval, so we likely have irregular cycles rather than true seasonality.

  4. Sudden shifts:
    • Between minutes 50–70, there is a steep decline, followed by a strong rebound, a sign of nonlinear dynamics or external shocks.

4.5.4 🔁 Time Series Cross-Validation

4.5.5 🧠 What Is Happening?

4.5.5.1 1. stretch_tsibble(.init = 10)

  • This prepares the data for rolling origin evaluation.

  • It creates multiple expanding windows starting with the first 10 observations.

  • Each window grows by 1 data point and is used to re-fit the models — great for simulating real-time forecasting.

4.5.5.2 2. model(...)

  • Three ETS models (Exponential Smoothing models) are being trained:

SES: Simple Exponential Smoothing (no trend or seasonality).

Holt: Holt’s Linear Trend method (additive trend).

Damped: Damped Trend method (trend is additive but damped over time using a damping parameter φ).

4.5.5.3 3. forecast(h = 1)

  • Forecast 1 step ahead from each window.

    This simulates a real-time forecasting task, where you always predict just the next time point.

4.5.5.4 4. accuracy(www_usage)

  • Compares the predicted values against the actual observed values from the original www_usage data.

    It computes metrics like:

  • MAE (Mean Absolute Error),

  • RMSE (Root Mean Squared Error),

  • MAPE (Mean Absolute Percentage Error), and others.

Code
www_usage |>
  stretch_tsibble(.init = 10) |>
  model(
    SES = ETS(value ~ error("A") + trend("N") + season("N")),
    Holt = ETS(value ~ error("A") + trend("A") + season("N")),
    Damped = ETS(value ~ error("A") + trend("Ad") + season("N"))
  ) |>
  forecast(h = 1) |>
  accuracy(www_usage)
# A tibble: 3 × 10
  .model .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Damped Test  0.288   3.69  3.00 0.347  2.26 0.663 0.636 0.336
2 Holt   Test  0.0610  3.87  3.17 0.244  2.38 0.701 0.668 0.296
3 SES    Test  1.46    6.05  4.81 0.904  3.55 1.06  1.04  0.803
💡 Why is this not standard cross-validation?

In regular cross-validation (like k-fold), data can be randomly split because order doesn’t matter.

But for time series, order is everything — we can’t train on future and test on past.

4.5.6 Which exponential smoothing method performs best on this time series data?

Answer:
After evaluating the models using time series cross-validation, Damped Holt’s method consistently produced the lowest MAE and RMSE values compared to Simple Exponential Smoothing and Holt’s linear trend method.

4.5.7 Why is Damped Holt's method preferred?

Answer:
Damped Holt’s method includes a damping parameter (φ < 1) that slows down the trend over time, avoiding unrealistic long-term forecasts. This feature makes it more reliable for series with trend but no indefinite growth, like internet usage over a short time window.

4.5.8 What's the next step after identifying the best model?

Answer:
Now that we’ve identified Damped Holt’s method as the most accurate, we apply it to the entire dataset to generate forecasts for future minutes. This ensures we use all available information for the final forecast.

Code
fit <- www_usage |>
  model(
    Damped = ETS(value ~ error("A") + trend("Ad") +
                   season("N"))
  )
# Estimated parameters:
tidy(fit)
# A tibble: 5 × 3
  .model term  estimate
  <chr>  <chr>    <dbl>
1 Damped alpha   1.00  
2 Damped beta    0.997 
3 Damped phi     0.815 
4 Damped l[0]   90.4   
5 Damped b[0]   -0.0173

4.5.9 📊 Model: Damped Holt’s

| Parameter | Value | Meaning |
| --- | --- | --- |
| alpha | 0.9999 | Very high level smoothing: the model gives almost full weight to the most recent observation when updating the level. |
| beta | 0.9966 | Also very high: the trend component is very reactive to recent changes. |
| phi | 0.8150 | The damping parameter: less than 1, so the trend is damped, i.e., flattened over time to avoid explosive forecasts. |
| l[0] | 90.35 | The estimated initial level of the series at time zero. |
| b[0] | -0.0173 | The estimated initial trend: slightly negative, indicating a very small downward slope at the beginning. |
Code
gg_tsresiduals(fit)

4.5.10 ✅ Overall Diagnostic Summary:

| Criterion | Result | Comment |
| --- | --- | --- |
| Residuals ≈ 0-mean | ✅ | No trend detected. |
| No autocorrelation | ⚠️ | Some lags significant → possible underfitting. |
| Normality | ✅ | Acceptably close. |
| Ljung-Box test | ❌ | From earlier: p-value = 0.00125 → residuals not white noise. |
Code
augment(fit) |> features(.resid, ljung_box, lag = 10)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 Damped    29.0   0.00125

4.5.11 🔍 Interpretation:

  • Null Hypothesis (H₀): The residuals are uncorrelated (i.e., white noise).

  • Alternative Hypothesis (H₁): There is autocorrelation in the residuals.

4.5.12 ✅ Conclusion:

The p-value is 0.00125, which is very small (< 0.05).

  • ❌ So, we reject the null hypothesis: this suggests that the residuals are not white noise.

  • ➡️ The model (Damped Holt) may not have captured all the patterns in the data.

  • ❌ Cross-validation does not guarantee the optimal model — it only tells you which model performs best among the ones you’ve tried.

Code
fit |>
  forecast(h = 10) |>
  autoplot(www_usage) +
  labs(x="Minute", y="Number of users",
       title = "Internet usage per minute")

✔️ Forecast Behavior: Shows a decreasing trend 📉, flattening due to the damping parameter (0.815).

✔️ Prediction Intervals: Wide intervals 📏 reflect historical data variation

✔️ Method Selection:

  • Easy choice here ✅: Both RMSE & MAE favored damped Holt's method.

  • In other cases ❓: Different accuracy measures may suggest different methods, requiring judgment.

✔️ Forecasting Insight: No single method is best for all cases 🔄—evaluation should be task-specific & frequent 📊.

Holt-Winters Seasonal Methods

4.6 Holt-Winters methods

Holt-Winters methods extend Holt’s linear trend to capture seasonality using three components:
- Level (\(\ell_t\))
- Trend (\(b_t\))
- Seasonal (\(s_t\))

Two variants exist:
1. Additive: Seasonal variations are constant over time.
2. Multiplicative: Seasonal variations scale with series level.


4.7 Holt-Winters Additive Method

Use when seasonal fluctuations are roughly constant (e.g., fixed holiday sales boosts).

4.7.1 Equations:

Forecast:
\[ \hat{y}_{t+h|t} = \ell_t + h b_t + s_{t+h-m(k+1)} \]

Level:
\[ \ell_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) \]

Trend:
\[ b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1} \]

Seasonal:
\[ s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m} \]

📌 Parameters:
- \(\alpha\): Smooths level
- \(\beta^*\): Smooths trend
- \(\gamma\): Smooths seasonality (\(0 \leq \gamma \leq 1-\alpha\))
- \(m\): Seasonal period (e.g., 12 for monthly data)
- \(k = \lfloor (h-1)/m \rfloor\): Ensures the seasonal indices used come from the final year of the sample

  • The seasonal component equation is reparameterized to simplify interpretation.

  • \(\gamma\) in the standard equation is a scaled version of \(\gamma^*\), adjusted by the level smoothing parameter \(\alpha\).

  • This scaling ensures the seasonal parameter \(\gamma\) stays within valid bounds, preventing overfitting and maintaining model stability.
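One pass of the four update equations can be sketched in base R. The parameter values and inputs are illustrative only, not estimated from data:

```r
# Holt-Winters additive: one time-step update of level, trend and season
hw_additive_update <- function(y_t, level, trend, s_lag_m,
                               alpha = 0.2, beta = 0.1, gamma = 0.1) {
  new_level  <- alpha * (y_t - s_lag_m) + (1 - alpha) * (level + trend)
  new_trend  <- beta * (new_level - level) + (1 - beta) * trend
  new_season <- gamma * (y_t - level - trend) + (1 - gamma) * s_lag_m
  list(level = new_level, trend = new_trend, season = new_season)
}

# With alpha = 1 the new level is exactly the seasonally adjusted observation
hw_additive_update(110, level = 100, trend = 2, s_lag_m = 10,
                   alpha = 1, beta = 0, gamma = 0)$level  # 100
```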

In-depth explanation ⬇️⬇️

Seasonal Component Equation Derivation

  1. Original Seasonal Equation:
    The seasonal component is initially expressed as:
    \[ s_t = \gamma^* (y_t - \ell_t) + (1 - \gamma^*) s_{t-m} \]

    • \(\gamma^*\): Seasonal smoothing parameter.
    • \(\ell_t\): Level component at time \(t\).
    • \(s_{t-m}\): Seasonal component from \(m\) periods ago.
  2. Substitute the Level Equation:
    From the Holt-Winters additive method, the level equation is:
    \[ \ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \]
    Substitute \(\ell_t\) into the seasonal equation:
    \[ s_t = \gamma^* \left[ y_t - \left( \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \right) \right] + (1 - \gamma^*) s_{t-m} \]

  3. Simplify the Expression:
    Expand and rearrange terms:
    \[ \begin{align*} s_t &= \gamma^* \left[ y_t - \alpha y_t + \alpha s_{t-m} - (1 - \alpha)(\ell_{t-1} + b_{t-1}) \right] + (1 - \gamma^*) s_{t-m} \\ &= \gamma^* \left[ (1 - \alpha)y_t + \alpha s_{t-m} - (1 - \alpha)(\ell_{t-1} + b_{t-1}) \right] + (1 - \gamma^*) s_{t-m} \\ &= \gamma^*(1 - \alpha)(y_t - \ell_{t-1} - b_{t-1}) + \gamma^* \alpha s_{t-m} + (1 - \gamma^*) s_{t-m} \\ &= \gamma^*(1 - \alpha)(y_t - \ell_{t-1} - b_{t-1}) + \left[ 1 - \gamma^*(1 - \alpha) \right] s_{t-m} \end{align*} \]

  4. Reparameterize with \(\gamma = \gamma^*(1 - \alpha)\):
    Let \(\gamma = \gamma^*(1 - \alpha)\). Substitute into the equation:
    \[ s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m} \]
    This matches the standard Holt-Winters seasonal smoothing equation.

  5. Parameter Constraints:

    • Original constraint: \(0 \leq \gamma^* \leq 1\).
    • After substitution: \(0 \leq \gamma = \gamma^*(1 - \alpha) \leq 1 - \alpha\).
    • Ensures the seasonal parameter \(\gamma\) remains bounded and stable.

4.7.2 Key Insight:

The reparameterization ensures the seasonal smoothing parameter \(\gamma\) scales with \(\alpha\), preventing overfitting and maintaining model stability.


4.8 Holt-Winters Multiplicative Method

Use when seasonal fluctuations grow with series level (e.g., holiday sales as a % of revenue).

4.8.1 Equations:

Forecast:
\[ \hat{y}_{t+h|t} = (\ell_t + h b_t) \cdot s_{t+h-m(k+1)} \]

Level:
\[ \ell_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1}) \]

Trend:
\[ b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1} \]

Seasonal:
\[ s_t = \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)s_{t-m} \]

📌 Key Differences:
- Additive: Seasonal component (\(s_t\)) is absolute (e.g., +100 units each December).
- Multiplicative: Seasonal component (\(s_t\)) is relative (e.g., ×1.2 each December).
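The difference is easy to see numerically, using made-up values for the level, trend, and seasonal indices:

```r
# Same level and trend, adjusted by an additive vs a multiplicative index
level <- 100; trend <- 2; h <- 1
s_add  <- 15    # additive: "+15 units in this season"
s_mult <- 1.15  # multiplicative: "+15% in this season"

(level + h * trend) + s_add   # 117
(level + h * trend) * s_mult  # 117.3
```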

4.8.2 Example: Domestic overnight trips in Australia

We apply Holt-Winters' method with both additive and multiplicative seasonality to forecast quarterly visitor nights in Australia spent by domestic tourists.

The plot shows the data from 1998–2017 and the forecasts for 2018–2020. The data show an obvious seasonal pattern, with peaks observed in the March quarter of each year, corresponding to the Australian summer.

Question 🕯️🕯️

Q: Why might we compare both additive and multiplicative ETS models when forecasting Australian domestic holiday trips?

A: We compare both because:

If tourism increases over time and seasonal fluctuations also grow (e.g., peak seasons become more extreme), a multiplicative model may better capture that behavior. But if seasonality and trends stay stable in size, the additive model might be more appropriate.

❓Using the tourism dataset, filter for holiday-purpose trips and aggregate total trips per quarter (in millions). Then, fit both an additive and a multiplicative ETS model, forecast for the next 3 years, and generate a plot comparing both forecasts. What do you observe about the differences in the two models’ forecasts?

Code
aus_holidays <- tourism |>
  filter(Purpose == "Holiday") |>
  summarise(Trips = sum(Trips)/1e3)
Code
fit <- aus_holidays |>
  model(
    additive = ETS(Trips ~ error("A") + trend("A") +
                                                season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") +
                                                season("M"))
  )
Code
fc <- fit |> forecast(h = "3 years")
fc |>
  autoplot(aus_holidays, level = NULL) +
  labs(title="Australian domestic tourism",
       y="Overnight trips (millions)") +
  guides(colour = guide_legend(title = "Forecast"))

Code
report(fit)
# A tibble: 2 × 9
  .model          sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
  <chr>            <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 additive       0.193     -105.  229.  231.  250. 0.174 0.184 0.321 
2 multiplicative 0.00212   -104.  227.  229.  248. 0.170 0.183 0.0328

4.8.3 Holt-Winters’ damped method

Damping is possible with both additive and multiplicative Holt-Winters’ methods. A method that often provides accurate and robust forecasts for seasonal data is the Holt-Winters method with a damped trend and multiplicative seasonality:

Forecast:
\[ \hat{y}_{t+h|t} = \left[ \ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t \right] \cdot s_{t+h-m(k+1)} \]

Level:
\[ \ell_t = \alpha \left( \frac{y_t}{s_{t-m}} \right) + (1-\alpha)(\ell_{t-1} + \phi b_{t-1}) \]

Trend:
\[ b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1} \]

Seasonal:
\[ s_t = \gamma \left( \frac{y_t}{\ell_{t-1} + \phi b_{t-1}} \right) + (1-\gamma)s_{t-m} \]

📌 Parameters:
- \(\phi \in [0.8, 0.98]\): Dampens trend over time
- \(\alpha\): Level smoothing (reactivity to new data)
- \(\beta^*\): Trend smoothing
- \(\gamma\): Seasonal smoothing
- \(m\): Seasonal period (e.g., 7 for daily data with weekly seasonality)
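The damping sum in the forecast equation is a finite geometric series, which makes the flattening-out behavior explicit:

\[ \phi + \phi^2 + \dots + \phi^h = \frac{\phi\left(1-\phi^h\right)}{1-\phi} \;\longrightarrow\; \frac{\phi}{1-\phi} \quad \text{as } h \to \infty \]

For example, with \(\phi = 0.9\) the total trend contribution can never exceed \(0.9/0.1 = 9\) times \(b_t\), so long-horizon forecasts level off instead of growing without bound. With \(\phi = 1\) the sum reduces to \(h\) and the undamped Holt-Winters forecast is recovered.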

4.8.4 Example: Holt-Winters method with daily data

The Holt-Winters method can also be applied to daily data, where the seasonal period is m = 7 and the appropriate unit of time for h is days. Here we forecast pedestrian traffic at a busy Melbourne train station in July 2016.

🧠 Task for You:
Using the pedestrian dataset, create a plot showing the 2-week forecast of daily pedestrian counts (in thousands) at Southern Cross Station, using the ETS(M, Ad, M) model.

Make sure to:

  • Filter data from July 2016

  • Use index_by() and summarise() to get daily totals in thousands

  • Forecast 2 weeks ahead

  • Plot actual and forecasted values up to August 14, 2016

  • Title your plot and label the y-axis as: "Pedestrians ('000)"

1. pedestrian |> filter(...)

We filter the pedestrian dataset to include:

  • Only data from Southern Cross Station

  • Only data from July 1, 2016 onward

2. index_by(Date)

This groups the data by day (i.e., aggregates hourly data into daily totals).
Think of this like creating a time series where each point represents a whole day.

4.8.5 3. summarise(Count = sum(Count)/1000)

This creates a new column Count by:

Summing all hourly pedestrian counts for each day

Dividing by 1000 to express the total in thousands of people
(makes the y-axis easier to read)

4.8.6 4. Filtering for July only

To fit the model, we use only July 2016 data (training window).

4.8.7 5. model(...)

We fit an ETS model with the following components:

error("M"): Multiplicative error (used when variation grows with level)

trend("Ad"): Additive damped trend (growth slows down over time)

season("M"): Multiplicative seasonality (seasonal variation scales with the level)

This model is commonly referred to as Holt-Winters multiplicative with damped trend.

4.8.8 6. Forecasting

|> forecast(h = "2 weeks")

We generate a 2-week-ahead forecast (i.e., 14 daily points into August).

4.8.9 7. Plotting

We plot:

  • Actual observed values up to August 14, 2016

  • Forecasted values from August 1 to 14

  • The y-axis labeled in thousands of pedestrians

  • A descriptive title

Code
sth_cross_ped <- pedestrian |>
  filter(Date >= "2016-07-01",
         Sensor == "Southern Cross Station") |>
  index_by(Date) |>
  summarise(Count = sum(Count)/1000)

sth_cross_ped |>
  filter(Date <= "2016-07-31") |>
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) |>
  forecast(h = "2 weeks") |>
  autoplot(sth_cross_ped |> filter(Date <= "2016-08-14")) +
  labs(title = "Daily traffic: Southern Cross",
       y="Pedestrians ('000)")

Forecasts of daily pedestrian traffic at the Southern Cross railway station, Melbourne.

Clearly the model has identified the weekly seasonal pattern and the increasing trend at the end of the data, and the forecasts are a close match to the test data.
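The claim that the forecasts closely match the test data can be quantified with test-set accuracy. A sketch, assuming the fpp3 packages and rebuilding the objects from the code above:

```r
library(fpp3)

sth_cross_ped <- pedestrian |>
  filter(Date >= "2016-07-01",
         Sensor == "Southern Cross Station") |>
  index_by(Date) |>
  summarise(Count = sum(Count) / 1000)

fit <- sth_cross_ped |>
  filter(Date <= "2016-07-31") |>
  model(hw = ETS(Count ~ error("M") + trend("Ad") + season("M")))

# Score the 2-week forecast against the held-out August observations
fit |>
  forecast(h = "2 weeks") |>
  accuracy(sth_cross_ped)
```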

Why Use This Method?

  1. Avoids Over-Forecasting: Damped trend (ϕ) prevents unrealistic long-term growth.

  2. Handles Scaling Seasonality: Multiplicative seasonality adapts to patterns proportional to series level.

  3. Flexible for Daily Data: Works with non-traditional seasonal periods (e.g., m=7 for daily weekly cycles).

5 Exponential Smoothing Methods: Comprehensive Classification

5.1 Overview

Exponential smoothing methods are classified by trend (T) and seasonal (S) components. Each method is labeled as (T,S), where:

  • Trend (T): N (None), A (Additive), A_d (Additive Damped)

  • Seasonal (S): N (None), A (Additive), M (Multiplicative)

This creates 9 possible combinations (Table 8.5). Below, we outline each method’s equations, use cases, and key properties.


5.2 Classification Table (T,S)

In the seasonal index \(s_{t+h-m(k+1)}\) below, \(k\) is the integer part of \((h-1)/m\), so forecasts always use the seasonal values estimated from the final year of the data.

(N,N): Simple Exponential Smoothing

\(\hat{y}_{t+h|t} = \ell_t\)

\(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\)

Use case: stationary data (no trend/seasonality)

(N,A): Additive Seasonal (no trend)

\(\hat{y}_{t+h|t} = \ell_t + s_{t+h-m(k+1)}\)

\(s_t = \gamma(y_t - \ell_{t-1}) + (1-\gamma)s_{t-m}\)

Use case: constant seasonal patterns

(N,M): Multiplicative Seasonal (no trend)

\(\hat{y}_{t+h|t} = \ell_t \cdot s_{t+h-m(k+1)}\)

\(s_t = \gamma(y_t / \ell_{t-1}) + (1-\gamma)s_{t-m}\)

Use case: seasonal patterns scaling with the level

(A,N): Holt’s Linear Trend

\(\hat{y}_{t+h|t} = \ell_t + h b_t\)

\(\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})\)

\(b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}\)

Use case: trending data (no seasonality)

(A,A): Additive Holt-Winters

\(\hat{y}_{t+h|t} = \ell_t + h b_t + s_{t+h-m(k+1)}\)

\(s_t = \gamma(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}\)

Use case: additive trend + fixed seasonality

(A,M): Multiplicative Holt-Winters

\(\hat{y}_{t+h|t} = (\ell_t + h b_t) \cdot s_{t+h-m(k+1)}\)

\(s_t = \gamma(y_t / (\ell_{t-1} + b_{t-1})) + (1-\gamma)s_{t-m}\)

Use case: trending data + scaling seasonality

(A_d,N): Damped Trend

\(\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t\)

\(\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})\)

\(b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1}\)

Use case: long-term forecasts with a fading trend

(A_d,A): Damped Additive Holt-Winters

\(\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t + s_{t+h-m(k+1)}\)

\(s_t = \gamma(y_t - \ell_{t-1} - \phi b_{t-1}) + (1-\gamma)s_{t-m}\)

Use case: damped trend + fixed seasonality

(A_d,M): Damped Multiplicative Holt-Winters

\(\hat{y}_{t+h|t} = \left[\ell_t + (\phi + \phi^2 + \dots + \phi^h) b_t\right] \cdot s_{t+h-m(k+1)}\)

\(s_t = \gamma(y_t / (\ell_{t-1} + \phi b_{t-1})) + (1-\gamma)s_{t-m}\)

Use case: damped trend + scaling seasonality

5.3 Key Equations (General Form)

For all methods, the h-step forecast and smoothing equations depend on the components:

5.3.1 Trend Component (T)

  • None (N): No trend term.
  • Additive (A): \(\ell_t = \alpha \cdot \text{adjusted } y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})\)
  • Additive Damped (A_d): \(\ell_t = \alpha \cdot \text{adjusted } y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})\)

5.3.2 Seasonal Component (S)

  • None (N): No seasonal term.
  • Additive (A): \(s_t = \gamma \cdot \text{residual} + (1-\gamma)s_{t-m}\)
  • Multiplicative (M): \(s_t = \gamma \cdot \text{ratio} + (1-\gamma)s_{t-m}\)

5.3.3 Parameters

  • \(\alpha\) (level), \(\beta^*\) (trend), \(\gamma\) (seasonality): \(0 \leq \alpha, \beta^*, \gamma \leq 1\)
  • \(\phi\) (damping): \(0.8 \leq \phi \leq 0.98\)
  • \(m\): Seasonal period (e.g., \(m=12\) for monthly data).
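In fable’s ETS(), these parameters are estimated from the data by default, but they can also be fixed by hand. A sketch, assuming the fpp3 packages; the argument placement here (alpha and beta inside the trend() special, gamma inside season(), phi inside a damped trend()) follows the fable documentation:

```r
library(fpp3)

algeria_economy <- global_economy |>
  filter(Country == "Algeria")

algeria_economy |>
  model(
    estimated = ETS(Exports ~ error("A") + trend("N") + season("N")),
    fixed     = ETS(Exports ~ error("A") + trend("N", alpha = 0.3) + season("N"))
  ) |>
  tidy()   # compare the estimated alpha with the fixed value of 0.3
```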

5.4 Notes

  1. Multiplicative Trend Exclusion: Methods with multiplicative trends (e.g., \(T=M\)) are excluded due to poor forecast reliability.
  2. Optimization: Parameters are estimated by minimizing SSE (sum of squared errors).
  3. Historical Context:
    • Pegels (1969): First classification framework.
    • Gardner (1985): Extended to include damped trends.
    • Hyndman et al. (2008): Comprehensive review in Forecasting with Exponential Smoothing.

5.5 When to Use Which Method?

5.5.1 Table 8.5: A Two-Way Classification of Exponential Smoothing Methods

Seasonal Component →      N (None)   A (Additive)   M (Multiplicative)
Trend Component ↓
N (None)                  (N,N)      (N,A)          (N,M)
A (Additive)              (A,N)      (A,A)          (A,M)
A_d (Additive Damped)     (A_d,N)    (A_d,A)        (A_d,M)

Data Characteristics                     Recommended Method
No trend/seasonality                     (N,N) Simple Exponential Smoothing
Additive trend only                      (A,N) Holt’s Linear Trend
Additive trend + fixed seasonality       (A,A) Additive Holt-Winters
Damped trend + scaling seasonality       (A_d,M) Damped Multiplicative HW
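Each (T,S) label in the table maps directly onto an ETS() specification, so the candidate methods can be fitted and compared in a single mable. A sketch, assuming the fpp3 packages and the aus_holidays series used earlier:

```r
library(fpp3)

aus_holidays <- tourism |>
  filter(Purpose == "Holiday") |>
  summarise(Trips = sum(Trips) / 1e3)

aus_holidays |>
  model(
    ses          = ETS(Trips ~ error("A") + trend("N")  + season("N")),  # (N,N)
    holt         = ETS(Trips ~ error("A") + trend("A")  + season("N")),  # (A,N)
    hw_additive  = ETS(Trips ~ error("A") + trend("A")  + season("A")),  # (A,A)
    hw_damp_mult = ETS(Trips ~ error("M") + trend("Ad") + season("M"))   # (A_d,M)
  ) |>
  glance() |>
  select(.model, AICc)   # lower AICc = better trade-off of fit and complexity
```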

5.6 Forecasting with Automatic Model Selection

Code
aus_holidays <- tourism |>
  filter(Purpose == "Holiday") |>
  summarise(Trips = sum(Trips)/1e3)
fit <- aus_holidays |>
  model(ETS(Trips))
report(fit)
Series: Trips 
Model: ETS(M,N,A) 
  Smoothing parameters:
    alpha = 0.3484054 
    gamma = 0.0001000018 

  Initial states:
     l[0]       s[0]      s[-1]      s[-2]    s[-3]
 9.727072 -0.5376106 -0.6884343 -0.2933663 1.519411

  sigma^2:  0.0022

     AIC     AICc      BIC 
226.2289 227.7845 242.9031 

Smoothing parameters:

  • alpha = 0.348: controls how much the level is updated. A value of roughly 0.35 indicates that the level is moderately adjusted based on the most recent observation.

  • gamma = 0.0001: very low seasonal smoothing → the seasonal pattern is held almost fixed over time

Initial states:

  • Level (l[0]) and seasonal components (s[0], s[-1], …) are initialized from the data

  • sigma² = 0.0022: estimated residual variance (lower is better)

  • AIC / AICc / BIC values: for model comparison (lower = better)

Code
fit |>
  forecast(h = 8) |>
  autoplot(aus_holidays) +
  labs(title="Australian domestic tourism",
       y="Overnight trips (millions)")

Note

ETS() automatically selects an appropriate structure for your data, but understanding the components and parameters is essential for interpreting the forecasts and verifying that the chosen model makes sense for your series.
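One practical way to verify that the automatically selected ETS(M,N,A) model makes sense is to plot its estimated states. A sketch, assuming the fpp3 packages and re-creating the fit object from above:

```r
library(fpp3)

aus_holidays <- tourism |>
  filter(Purpose == "Holiday") |>
  summarise(Trips = sum(Trips) / 1e3)

fit <- aus_holidays |>
  model(ETS(Trips))

# Decompose the fitted model into level, seasonal, and remainder components;
# the nearly constant seasonal panel reflects the tiny estimated gamma
components(fit) |>
  autoplot()
```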

 

A work by Dr. BENGHALEM Abdelhadi

abdelhadi.benghalem@gmail.com